In statistics, and especially Bayesian statistics, the posterior predictive distribution is the distribution of unobserved values (predictions) conditional on the observed data.<ref>http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mcmc_sect034.htm</ref> It is the distribution that a new i.i.d. data point <math>\tilde{x}</math> would have, given a set of ''N'' existing i.i.d. observations <math>\mathbf{X} = \{x_1, \dots, x_N\}</math>. In a frequentist context, this might be derived by computing the maximum likelihood estimate (or some other point estimate) of the parameter(s) given the observed data and plugging it into the distribution function of the new observations. However, the concept of a posterior predictive distribution is normally used in a Bayesian context, where it makes use of the entire posterior distribution of the parameter(s) given the observed data, yielding a full probability distribution over the new data point rather than one based on a single point estimate. Specifically, it is computed by marginalising over the parameters, using the posterior distribution:

:<math>p(\tilde{x} \mid \mathbf{X}, \alpha) = \int_\theta p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta,</math>

where <math>\theta</math> represents the parameter(s) and <math>\alpha</math> the hyperparameter(s). Any of <math>\tilde{x}, \mathbf{X}, \theta, \alpha</math> may be vectors (or equivalently, may stand for multiple parameters). Note that this is equivalent to the expected value of the distribution of the new data point, when the expectation is taken over the posterior distribution, i.e.:

:<math>p(\tilde{x} \mid \mathbf{X}, \alpha) = \operatorname{E}_{\theta \mid \mathbf{X}, \alpha}\left[\, p(\tilde{x} \mid \theta) \,\right].</math>

(To get an intuition for this, keep in mind that the expected value is a type of average. The predictive probability of seeing a particular value of a new observation varies depending on the parameters of the distribution of that observation. In this case we do not know the exact values of the parameters, but we have a posterior distribution over them, which specifies what we believe the parameters to be, given the data we have already seen.
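The marginalisation above can be approximated by averaging <math>p(\tilde{x} \mid \theta)</math> over draws from the posterior. The following sketch illustrates this for a conjugate Beta–Bernoulli model, where the posterior predictive also has a closed form we can check against; the hyperparameters and data are illustrative, not taken from the text.

```python
import random

# Illustrative setup: Beta(a, b) prior on the success probability theta of a
# Bernoulli likelihood. For this conjugate pair the posterior predictive
# P(x_new = 1 | X) = E_{theta|X}[theta] = a'/(a' + b'), which lets us verify
# the Monte Carlo average over posterior draws.

a, b = 2.0, 2.0                       # prior hyperparameters
data = [1, 0, 1, 1, 0, 1, 1]          # observed Bernoulli draws
a_post = a + sum(data)                # conjugate update: a' = a + #successes
b_post = b + len(data) - sum(data)    # b' = b + #failures

# Exact posterior predictive probability of a new success:
exact = a_post / (a_post + b_post)

# Monte Carlo: average P(x_new = 1 | theta) = theta over posterior samples.
random.seed(0)
samples = [random.betavariate(a_post, b_post) for _ in range(200_000)]
mc = sum(samples) / len(samples)

print(exact)   # 7/11, about 0.636
print(abs(mc - exact) < 0.01)
```

The same pattern (sample parameters from the posterior, average the likelihood of the new point) applies even when no closed form exists, e.g. with MCMC output.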
Logically, then, to get "the" predictive probability, we should average all of the various predictive probabilities over the different possible parameter values, weighting them by how strongly we believe in them. This is exactly what this expected value does. Compare this to the approach in frequentist statistics, where a single estimate of the parameters, e.g. a maximum likelihood estimate, would be computed and then plugged in. This is equivalent to averaging over a posterior distribution with no variance, i.e. one in which we are completely certain that the parameter has a single value. The result is weighted too strongly towards the mode of the posterior and takes no account of other possible values, unlike in the Bayesian approach.)

==Prior vs. posterior predictive distribution==

The prior predictive distribution, in a Bayesian context, is the distribution of a data point marginalized over its prior distribution. That is, if <math>\tilde{x} \sim F(\tilde{x} \mid \theta)</math> and <math>\theta \sim G(\theta \mid \alpha)</math>, then the prior predictive distribution is the corresponding distribution <math>H(\tilde{x} \mid \alpha)</math>, where

:<math>p_H(\tilde{x} \mid \alpha) = \int_\theta p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha) \, d\theta.</math>

Note that this is similar to the posterior predictive distribution except that the marginalization (or equivalently, expectation) is taken with respect to the prior distribution instead of the posterior distribution.

Furthermore, if the prior distribution is a conjugate prior, then the posterior predictive distribution will belong to the same family of distributions as the prior predictive distribution. This is easy to see: if the prior distribution is conjugate, then

:<math>p(\theta \mid \mathbf{X}, \alpha) = p_G(\theta \mid \alpha'),</math>

i.e. the posterior distribution also belongs to <math>G,</math> but simply with a different parameter <math>\alpha'</math> in place of the original parameter <math>\alpha.</math> Then,

:<math>\begin{align}
p(\tilde{x} \mid \mathbf{X}, \alpha) &= \int_\theta p_F(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta \\
&= \int_\theta p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha') \, d\theta \\
&= p_H(\tilde{x} \mid \alpha').
\end{align}</math>

Hence, the posterior predictive distribution follows the same distribution ''H'' as the prior predictive distribution, but with the posterior values of the hyperparameters substituted for the prior ones.
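The "same family, updated hyperparameters" property can be made concrete with the Beta–Bernoulli pair (illustrative numbers, not from the text): both predictive distributions are Bernoulli, with the prior predictive using <math>(a, b)</math> and the posterior predictive using the updated <math>(a', b')</math> in the very same formula.

```python
# Sketch of the "same family" claim for a conjugate pair. With a Beta(a, b)
# prior on a Bernoulli parameter theta, the predictive distribution of a new
# point is Bernoulli with p = a/(a + b); after observing data, the posterior
# predictive is the SAME functional form H, evaluated at the updated
# hyperparameters a', b' instead of a, b.

def bernoulli_predictive(a, b):
    """Predictive P(x = 1) under a Beta(a, b) belief about theta."""
    return a / (a + b)

a, b = 1.0, 3.0
data = [1, 1, 0, 1]

prior_pred = bernoulli_predictive(a, b)           # H evaluated at (a, b)
a_new = a + sum(data)                             # a' = a + #successes
b_new = b + len(data) - sum(data)                 # b' = b + #failures
post_pred = bernoulli_predictive(a_new, b_new)    # H evaluated at (a', b')

print(prior_pred)  # 1/4 = 0.25
print(post_pred)   # 4/8 = 0.5
```

Only the hyperparameters change between the two calls; the distribution family ''H'' does not.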
The prior predictive distribution is in the form of a compound distribution, and in fact is often used to ''define'' a compound distribution, because of the lack of any complicating factors such as the dependence on the data <math>\mathbf{X}</math> and the issue of conjugacy. For example, the Student's t-distribution can be ''defined'' as the prior predictive distribution of a normal distribution with known mean ''μ'' but unknown variance ''σ''<sub>''x''</sub><sup>2</sup>, with a conjugate scaled-inverse-chi-squared distribution placed on ''σ''<sub>''x''</sub><sup>2</sup>, with hyperparameters ''ν'' and ''σ''<sup>2</sup>. The resulting compound distribution is indeed a non-standardized Student's t-distribution, and follows one of the two most common parameterizations of this distribution. The corresponding posterior predictive distribution is then again Student's t, with the updated hyperparameters that appear in the posterior distribution also appearing directly in the posterior predictive distribution.

Note that in some cases the appropriate compound distribution is defined using a different parameterization than the one that would be most natural for the predictive distributions in the problem at hand. Often this is because the prior distribution used to define the compound distribution is different from the one used in the current problem. For example, as indicated above, the Student's t-distribution was defined in terms of a scaled-inverse-chi-squared distribution placed on the variance. However, it is more common to use an inverse gamma distribution as the conjugate prior in this situation. The two are in fact equivalent apart from parameterization; hence, the Student's t-distribution can still be used for either predictive distribution, but the hyperparameters must be reparameterized before being plugged in.

(Excerpted from the free encyclopedia Wikipedia; the full article is "Posterior predictive distribution".)
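The compounding described above can be checked numerically: draw a variance from a scaled-inverse-chi-squared prior, then draw the data point from a normal with that variance, and the marginal should match a non-standardized Student's t. The sketch below (all numbers illustrative) uses the fact that a scaled-inverse-chi-squared(''ν'', ''σ''<sup>2</sup>) draw is ''νσ''<sup>2</sup> divided by a chi-squared(''ν'') draw, and compares the sample variance to the t-distribution's theoretical variance ''νσ''<sup>2</sup>/(''ν'' − 2) for ''ν'' > 2.

```python
import random

# Monte Carlo sketch of the compound construction (illustrative parameters):
#   sigma^2 ~ scaled-inverse-chi-squared(nu, s2), then x ~ N(mu, sigma^2).
# Marginally, x should follow a non-standardized Student's t with nu degrees
# of freedom, location mu, and squared scale s2; for nu > 2 its variance is
# nu * s2 / (nu - 2), which we compare to the sample variance.

random.seed(1)
mu, nu, s2 = 0.0, 5.0, 2.0
n = 400_000

xs = []
for _ in range(n):
    chi2 = random.gammavariate(nu / 2, 2)   # chi-squared(nu) draw
    sigma2 = nu * s2 / chi2                 # scaled-inverse-chi-squared(nu, s2)
    xs.append(random.gauss(mu, sigma2 ** 0.5))

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(var)  # should be near nu * s2 / (nu - 2) = 10/3
```

Replacing the scaled-inverse-chi-squared draw with the equivalent inverse-gamma(''ν''/2, ''νσ''<sup>2</sup>/2) draw gives the same marginal, which is the reparameterization equivalence mentioned above.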